GRADIENT METHODS FOR ANALYTIC ROTATION
ABSTRACT 

Gradient methods are employed in orthogonal and oblique analytic 
rotation. Constraints are imposed on the elements of the transformation 
matrix by means of reparameterisations. 






1. INTRODUCTION 

The analytic rotation of a factor matrix is a problem in optimisation
subject to constraints. Given a p × m factor matrix, A, we have to
find an m × m transformation matrix, T, which optimises a function,
f, of the elements of the rotated factor matrix

$$\Lambda = AT .$$

The transformation matrix is required to satisfy certain constraints:

$$T'T = I \tag{1}$$

in orthogonal rotation, and

$$\mathrm{Diag}(T^{-1}T^{-1\prime}) = I \tag{2}$$

in oblique rotation of the primary factor pattern.

If the reference structure rather than the primary factor pattern 
is to be rotated obliquely, other constraints are imposed on T . This 
approach, however, has serious disadvantages which have been pointed out 
by Jennrich & Sampson (1966). It will not be considered here. 

Iterative algorithms for optimising a criterion for simple structure,
f, which operate on pairs of columns of the factor matrix sequentially
have been successful, both in orthogonal rotation (Kaiser, 1959) and in
oblique rotation of the factor pattern (Jennrich & Sampson, 1966). Such
algorithms do, however, have some disadvantages. Rounding errors can
accumulate during iteration. Also, each step yields a conditional
optimum of f with respect to one free parameter, holding the rest fixed,
a process which can sometimes converge slowly, particularly when the
number of parameters is large (Box et al., 1969, p. 25).

Another algorithm for orthogonal rotation which obtains a new T on
each iteration and does not accumulate rounding error has been proposed
by Horst (1965, Sections 18.4, 18.7.2; Mulaik, 1972, Section 10.5). Little
is yet known about the convergence properties of the algorithm.

A gradient method with the property of quadratic termination, the
Fletcher-Powell method, has been employed with considerable success by
Jöreskog (1967, 1969) in maximum likelihood factor analysis. The purpose
of this paper is to show that gradient methods, such as that of Fletcher
& Powell (1963), can easily be employed in analytic rotation.
Reparameterisations are used, the m² elements of T being expressed as
functions of a smaller number of free parameters. Since a new T is
obtained at each step of the algorithm, there is no accumulation of
rounding error.

Section 2 reviews some basic results concerning the function to be
minimised. A reparameterisation for orthogonal rotation and formulae for
the gradient are given in Section 3. Corresponding results for oblique
rotation are given in Section 4. In Section 5 the implementation of the
procedures is discussed.

2. CRITERIA FOR SIMPLE STRUCTURE

The methods to be given subsequently are general and require only a
criterion for simple structure to be minimised, f, and a corresponding
m × m matrix of first derivatives

$$Z = \frac{\partial f}{\partial T} .$$

A property of the criterion which will be assumed is that f is invariant
under interchanges or reflections of columns of T.

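We shall specifically consider a family of criteria for simple
structure, dependent on a parameter κ (0 ≤ κ ≤ 1), which was proposed
by Crawford & Ferguson (1970):

$$
\begin{aligned}
f &= (1-\kappa)\sum_{i=1}^{p}\sum_{j=1}^{m}\sum_{s\neq j}\lambda_{ij}^{2}\lambda_{is}^{2}
   + \kappa\sum_{j=1}^{m}\sum_{i=1}^{p}\sum_{r\neq i}\lambda_{ij}^{2}\lambda_{rj}^{2} \\
  &= (1-\kappa)\,\mathrm{tr}[D_1^{2}] + \kappa\,\mathrm{tr}[D_2^{2}] - \mathrm{tr}[\Lambda'\tilde\Lambda]
\end{aligned}
\tag{3}
$$

where $\lambda_{ij}$ is the element in row i and column j of the rotated
factor matrix $\Lambda = AT$,

$$D_1 = \mathrm{Diag}(\Lambda\Lambda') , \qquad D_2 = \mathrm{Diag}(\Lambda'\Lambda) ,$$

and $\tilde\Lambda$ is the p × m matrix with typical element $\lambda_{ij}^{3}$.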
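The equivalence of the triple-sum and trace forms of (3) can be checked numerically. The following is a minimal sketch in Python with NumPy; the function names and the random test matrix are illustrative assumptions and are not part of the procedures described in this paper.

```python
import numpy as np

def cf_criterion_sums(L, kappa):
    """Criterion (3) evaluated from the explicit triple sums."""
    p, m = L.shape
    L2 = L ** 2
    # Row complexity: products of squared loadings within each row.
    row_term = sum(L2[i, j] * L2[i, s]
                   for i in range(p) for j in range(m)
                   for s in range(m) if s != j)
    # Column complexity: products of squared loadings within each column.
    col_term = sum(L2[i, j] * L2[r, j]
                   for j in range(m) for i in range(p)
                   for r in range(p) if r != i)
    return (1.0 - kappa) * row_term + kappa * col_term

def cf_criterion_trace(L, kappa):
    """Criterion (3) evaluated from the trace form."""
    d1 = np.sum(L ** 2, axis=1)   # diagonal of Diag(Lambda Lambda')
    d2 = np.sum(L ** 2, axis=0)   # diagonal of Diag(Lambda' Lambda)
    # tr[Lambda' Lambda~] is the sum of fourth powers of the loadings.
    return ((1.0 - kappa) * np.sum(d1 ** 2)
            + kappa * np.sum(d2 ** 2)
            - np.sum(L ** 4))

rng = np.random.default_rng(0)
L = rng.normal(size=(6, 3))       # hypothetical rotated factor matrix
kappa = 1.0 / L.shape[0]          # the Varimax member of the family
f_sums = cf_criterion_sums(L, kappa)
f_trace = cf_criterion_trace(L, kappa)
```

The trace form avoids the triple loops and is the form one would normally use inside an iterative algorithm.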



This family of criteria is equivalent to the Orthomax family in
orthogonal rotation (Crawford & Ferguson, 1970). Minimising f with
κ = 0 gives the Quartimax solution, κ = 1/p gives the Varimax
solution, κ = m/(2p) gives the Equamax solution and
κ = (m - 1)/(p + m - 2) gives the Parsimax solution.

In oblique rotation, minimisation of f (at least when 0 < κ < 1)
cannot result in the factor correlation matrix,

$$C = T^{-1}T^{-1\prime} ,$$

becoming singular (Crawford & Koopman, 1972). This result follows
immediately from a theorem due to Jennrich & Sampson (1966, Theorem 1).
Minimising f with κ = 0 gives the Quartimin solution considered by
Jennrich & Sampson.

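The m × m matrix of first derivatives of f with respect to the
elements of T is (Mulaik, 1972, Section 10.5):

$$Z = \frac{\partial f}{\partial T} = 4A'\{(1-\kappa)D_1\Lambda + \kappa\Lambda D_2 - \tilde\Lambda\} , \tag{4}$$

where, as in (3), $D_1 = \mathrm{Diag}(\Lambda\Lambda')$, $D_2 = \mathrm{Diag}(\Lambda'\Lambda)$
and $\tilde\Lambda$ has typical element $\lambda_{ij}^{3}$.

It is of interest to note that f may be computed using

$$f = \tfrac{1}{4}\,\mathrm{tr}[T'Z] \tag{5}$$

which is equivalent to (3).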
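Both (4) and (5) can be verified numerically. The sketch below, in Python with NumPy, compares the analytic derivative matrix Z with central finite differences and checks that a quarter of tr[T'Z] reproduces f. The function names, the random matrices A and T, and the step length h are assumptions made for illustration only.

```python
import numpy as np

def cf_value(A, T, kappa):
    """Criterion (3) at the rotated matrix Lambda = A T."""
    L = A @ T
    d1 = np.sum(L ** 2, axis=1)
    d2 = np.sum(L ** 2, axis=0)
    return ((1 - kappa) * np.sum(d1 ** 2)
            + kappa * np.sum(d2 ** 2) - np.sum(L ** 4))

def cf_gradient(A, T, kappa):
    """Derivative matrix Z of (4)."""
    L = A @ T
    d1 = np.sum(L ** 2, axis=1)            # diagonal of D1
    d2 = np.sum(L ** 2, axis=0)            # diagonal of D2
    inner = ((1 - kappa) * d1[:, None] * L   # D1 Lambda scales rows
             + kappa * L * d2[None, :]       # Lambda D2 scales columns
             - L ** 3)                       # elementwise cubes
    return 4.0 * A.T @ inner

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 3))                # hypothetical factor matrix
T = rng.normal(size=(3, 3))                # any T; no constraint is needed here
kappa = 1.0 / A.shape[0]

Z = cf_gradient(A, T, kappa)
f = cf_value(A, T, kappa)
f_from_Z = 0.25 * np.trace(T.T @ Z)        # equation (5)

# Central finite differences with respect to each element of T.
h = 1e-6
Z_fd = np.zeros_like(T)
for i in range(T.shape[0]):
    for j in range(T.shape[1]):
        E = np.zeros_like(T)
        E[i, j] = h
        Z_fd[i, j] = (cf_value(A, T + E, kappa)
                      - cf_value(A, T - E, kappa)) / (2 * h)
```

Equation (5) holds because f is a homogeneous function of degree four in the elements of T, so the check does not require T to satisfy (1) or (2).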

3. ORTHOGONAL ROTATION 

In order to impose the m(m + 1)/2 constraints in (1) on the m²
elements of T we shall express T as a function of

$$q = m^{2} - m(m+1)/2 = m(m-1)/2 \tag{6}$$

parameters by means of the Cayley formulas (Gantmacher, 1959, pp. 288-289).
If T is any orthogonal matrix such that

$$|I + T| \neq 0 , \tag{7}$$

then there is a nonsingular matrix X, where

$$x_{ij} = -x_{ji} , \quad i > j \tag{8}$$

$$x_{ii} = 1 , \quad i = 1, 2, \ldots, m \tag{9}$$

such that

$$T = 2X^{-1} - I . \tag{10}$$

Conversely, given any nonsingular X satisfying (8) and (9), the matrix
T constructed from (10) will satisfy (1).

We can therefore regard the transformation matrix T as a function
of a matrix X with elements satisfying (8) and (9) and minimise f with
respect to the m(m - 1)/2 free parameters
$x_{21}, x_{31}, x_{32}, \ldots, x_{m,m-1}$.

If necessary, the matrix X corresponding to a particular T may be
obtained from

$$X = 2(I + T)^{-1} . \tag{11}$$
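The reparameterisation in (8) to (11) can be sketched as follows in Python with NumPy. The function names and the particular parameter values are hypothetical; the free parameters are stored in the order $x_{21}, x_{31}, x_{32}, \ldots$ used above.

```python
import numpy as np

def x_from_params(theta, m):
    """Build X with unit diagonal (9) and x_ij = -x_ji (8)."""
    X = np.eye(m)
    rows, cols = np.tril_indices(m, k=-1)   # (2,1), (3,1), (3,2), ...
    X[rows, cols] = theta
    X[cols, rows] = -theta
    return X

def t_from_x(X):
    """Equation (10): T = 2 X^{-1} - I."""
    return 2.0 * np.linalg.inv(X) - np.eye(X.shape[0])

def x_from_t(T):
    """Equation (11): X = 2 (I + T)^{-1}, valid when |I + T| is nonzero."""
    return 2.0 * np.linalg.inv(np.eye(T.shape[0]) + T)

m = 4
theta = np.array([0.3, -0.5, 0.2, 0.1, 0.7, -0.4])   # m(m - 1)/2 = 6 values
X = x_from_params(theta, m)
T = t_from_x(X)          # orthogonal by construction
X_back = x_from_t(T)     # recovers X
```

Any real values of the free parameters give a nonsingular X, since X is the identity plus a skew-symmetric matrix, so the minimisation over the parameters is unconstrained.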
Because of (7), orthogonal matrices such as

[example matrices not legible]

which result in interchanges or reflections of columns of the rotated
factor matrix cannot be represented by (10). This, however, does not
matter since f is invariant under such operations.
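Condition (7) is easily examined for particular orthogonal matrices. The sketch below, in Python with NumPy, uses three 2 × 2 matrices chosen here for illustration (a column interchange, a column reflection and a plane rotation); they are assumptions and are not necessarily the examples printed in the original.

```python
import numpy as np

def cayley_det(T):
    """Determinant |I + T| appearing in condition (7)."""
    return np.linalg.det(np.eye(T.shape[0]) + T)

swap = np.array([[0.0, 1.0], [1.0, 0.0]])        # interchanges the two columns
reflect = np.array([[1.0, 0.0], [0.0, -1.0]])    # reflects the second column
angle = 0.3
rotate = np.array([[np.cos(angle), -np.sin(angle)],
                   [np.sin(angle),  np.cos(angle)]])

det_swap = cayley_det(swap)          # zero: (10) cannot represent this T
det_reflect = cayley_det(reflect)    # zero: (10) cannot represent this T
det_rotate = cayley_det(rotate)      # 2 + 2 cos(angle), nonzero

# For the plane rotation, X from (11) exists and satisfies (8) and (9).
X_rotate = 2.0 * np.linalg.inv(np.eye(2) + rotate)
```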

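Gradient methods for minimising f will require first derivatives of
f with respect to the $x_{ij}$ (i > j). Using the chain rule we obtain:

$$
\begin{aligned}
\frac{\partial f}{\partial x_{ij}}
 &= \mathrm{tr}\!\left[Z'\,\frac{\partial T}{\partial x_{ij}}\right] \\
 &= -2\,\mathrm{tr}\!\left[Z'X^{-1}\frac{\partial X}{\partial x_{ij}}X^{-1}\right] \\
 &= 2\left([X^{-1}Z'X^{-1}]_{ij} - [X^{-1}Z'X^{-1}]_{ji}\right) .
\end{aligned}
\tag{12}
$$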
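This result is general. Simplification is possible when Z is
defined by (4). It is easily verified that $X^{-1\prime}A'D_1AX^{-1}$,
with $D_1 = \mathrm{Diag}(\Lambda\Lambda')$, is symmetric.
Consequently the first term in (4) may be discarded and (12) becomes

$$\frac{\partial f}{\partial x_{ij}} = 2\left([X^{-1}\tilde Z'X^{-1}]_{ij} - [X^{-1}\tilde Z'X^{-1}]_{ji}\right) \tag{13}$$

where

$$\tilde Z = 4A'\{\kappa\Lambda D_2 - \tilde\Lambda\} ,$$

$D_2 = \mathrm{Diag}(\Lambda'\Lambda)$ and $\tilde\Lambda$ has typical element $\lambda_{ij}^{3}$.

This simplification reflects the fact that the first term in (3)
remains constant when T is orthogonal.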
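The gradient for orthogonal rotation can be checked against finite differences, and the effect of discarding the first term of (4) can be confirmed at the same time. The following Python sketch with NumPy is an illustration under stated assumptions: the function names, the `drop_first_term` flag, the random matrix A and the parameter values are all hypothetical.

```python
import numpy as np

def cf_value_and_z(A, T, kappa, drop_first_term=False):
    """Criterion (3) and derivative matrix of (4) at Lambda = A T."""
    L = A @ T
    d1 = np.sum(L ** 2, axis=1)              # diagonal of D1
    d2 = np.sum(L ** 2, axis=0)              # diagonal of D2
    f = ((1 - kappa) * np.sum(d1 ** 2)
         + kappa * np.sum(d2 ** 2) - np.sum(L ** 4))
    inner = kappa * L * d2[None, :] - L ** 3
    if not drop_first_term:
        inner = inner + (1 - kappa) * d1[:, None] * L
    return f, 4.0 * A.T @ inner

def orthogonal_f_and_grad(theta, A, kappa, drop_first_term=False):
    """f and its derivatives with respect to x_21, x_31, x_32, ..."""
    m = A.shape[1]
    rows, cols = np.tril_indices(m, k=-1)
    X = np.eye(m)
    X[rows, cols] = theta
    X[cols, rows] = -theta
    X_inv = np.linalg.inv(X)
    T = 2.0 * X_inv - np.eye(m)              # equation (10)
    f, Z = cf_value_and_z(A, T, kappa, drop_first_term)
    M = X_inv @ Z.T @ X_inv
    G = 2.0 * (M - M.T)                      # G[i, j] is the derivative for x_ij
    return f, G[rows, cols]

rng = np.random.default_rng(2)
A = rng.normal(size=(7, 3))
kappa = 1.0 / A.shape[0]
theta = np.array([0.2, -0.4, 0.3])

f0, grad_full = orthogonal_f_and_grad(theta, A, kappa)
_, grad_short = orthogonal_f_and_grad(theta, A, kappa, drop_first_term=True)

# Central finite differences in the free parameters.
h = 1e-6
grad_fd = np.zeros_like(theta)
for k in range(theta.size):
    step = np.zeros_like(theta)
    step[k] = h
    f_plus, _ = orthogonal_f_and_grad(theta + step, A, kappa)
    f_minus, _ = orthogonal_f_and_grad(theta - step, A, kappa)
    grad_fd[k] = (f_plus - f_minus) / (2 * h)
```

Here `grad_full` uses the whole of (4) and `grad_short` uses only the last two terms; the two agree because the contribution of the first term is symmetric and cancels in the difference.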

4. OBLIQUE ROTATION

The m constraints in (2) may be imposed on the m² elements of T
by expressing T as a function of

$$q = m^{2} - m = m(m-1) \tag{14}$$

parameters.

If T is any nonsingular matrix which satisfies (2) and which has
positive diagonal elements,

$$t_{ii} > 0 , \quad i = 1, 2, \ldots, m \tag{15}$$

then there is a nonsingular matrix X, with diagonal elements satisfying
(9), such that

$$T = X\,\mathrm{Diag}^{1/2}(V) \tag{16}$$

where

$$V = X^{-1}X^{-1\prime} .$$

Conversely, given any nonsingular X, the matrix T constructed using
(16) will satisfy (2).
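The oblique reparameterisation can be illustrated numerically. The sketch below, in Python with NumPy, builds T from a matrix X with unit diagonal, checks that the implied factor correlation matrix has unit diagonal, and recovers X from T by dividing each column of T by its diagonal element. The function name and the matrix X are assumptions chosen for illustration.

```python
import numpy as np

def oblique_t_from_x(X):
    """Equation (16): T = X Diag^{1/2}(V) with V = X^{-1} X^{-1}'."""
    X_inv = np.linalg.inv(X)
    V = X_inv @ X_inv.T
    # Post-multiplying by a diagonal matrix scales the columns of X.
    T = X * np.sqrt(np.diag(V))[None, :]
    return T, V

# Hypothetical X with unit diagonal and m(m - 1) = 6 free elements.
X = np.array([[1.0, 0.2, -0.3],
              [0.4, 1.0, 0.1],
              [-0.2, 0.5, 1.0]])

T, V = oblique_t_from_x(X)

# Constraint (2): the matrix T^{-1} T^{-1}' has unit diagonal.
T_inv = np.linalg.inv(T)
C = T_inv @ T_inv.T

# The same matrix obtained by rescaling V to unit diagonal.
d = np.sqrt(np.diag(V))
C_from_V = V / np.outer(d, d)

# Recover X by dividing each column of T by its diagonal element.
X_back = T / np.diag(T)[None, :]
```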

We may therefore define the transformation matrix, T, by (16)
and minimise f with respect to the m(m - 1) nondiagonal elements of
X. Again, certain permutations or reflections of columns of T cannot
be represented because of (15). Given any nonsingular T, however, it is
always possible to interchange and reflect columns so that (15) is
satisfied.

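The factor correlation matrix is

$$C = \mathrm{Diag}^{-1/2}(V)\,V\,\mathrm{Diag}^{-1/2}(V)$$

and, if necessary, X may be obtained from T using

$$X = T\,\mathrm{Diag}^{-1}(T) . \tag{17}$$

The gradient is given by

$$
\begin{aligned}
\frac{\partial f}{\partial x_{ij}}
 &= \mathrm{tr}\!\left[Z'\,\frac{\partial T}{\partial x_{ij}}\right] \qquad (i \neq j) \\
 &= \mathrm{tr}\!\left[Z'\{J_{ij}\,\mathrm{Diag}^{1/2}(V) - X\,\mathrm{Diag}^{-1/2}(V)\,\mathrm{Diag}[X^{-1}J_{ij}V]\}\right] \\
 &= [Z\,\mathrm{Diag}^{1/2}(V) - X^{-1\prime}\,\mathrm{Diag}(Z'X)\,\mathrm{Diag}^{-1/2}(V)\,V]_{ij} ,
\end{aligned}
\tag{18}
$$

where $J_{ij}$ denotes the m × m matrix with unity in position (i, j)
and zeros elsewhere.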
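The oblique gradient can be compared with finite differences in the same way as the orthogonal one. The Python sketch below, using NumPy, computes the matrix whose (i, j) element is the derivative of f with respect to $x_{ij}$ as Z Diag^{1/2}(V) minus X^{-1}' Diag(Z'X) Diag^{-1/2}(V) V; only the nondiagonal elements are compared, since the diagonal of X is fixed at unity. Function names, the random matrix A and the matrix X are illustrative assumptions.

```python
import numpy as np

def cf_value_and_z(A, T, kappa):
    """Criterion (3) and derivative matrix Z of (4) at Lambda = A T."""
    L = A @ T
    d1 = np.sum(L ** 2, axis=1)
    d2 = np.sum(L ** 2, axis=0)
    f = ((1 - kappa) * np.sum(d1 ** 2)
         + kappa * np.sum(d2 ** 2) - np.sum(L ** 4))
    inner = (1 - kappa) * d1[:, None] * L + kappa * L * d2[None, :] - L ** 3
    return f, 4.0 * A.T @ inner

def oblique_f_and_grad(X, A, kappa):
    """f and the matrix of derivatives with respect to the elements of X."""
    X_inv = np.linalg.inv(X)
    V = X_inv @ X_inv.T
    d = np.sqrt(np.diag(V))                  # diagonal of Diag^{1/2}(V)
    T = X * d[None, :]                       # equation (16)
    f, Z = cf_value_and_z(A, T, kappa)
    zx = np.diag(Z.T @ X)                    # diagonal of Diag(Z'X)
    G = Z * d[None, :] - X_inv.T @ ((zx / d)[:, None] * V)
    return f, G

rng = np.random.default_rng(3)
A = rng.normal(size=(7, 3))
kappa = 1.0 / A.shape[0]
X = np.array([[1.0, 0.2, -0.3],
              [0.4, 1.0, 0.1],
              [-0.2, 0.5, 1.0]])

f0, G = oblique_f_and_grad(X, A, kappa)

# Central finite differences in the nondiagonal elements of X.
h = 1e-6
G_fd = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        if i == j:
            continue
        E = np.zeros_like(X)
        E[i, j] = h
        f_plus, _ = oblique_f_and_grad(X + E, A, kappa)
        f_minus, _ = oblique_f_and_grad(X - E, A, kappa)
        G_fd[i, j] = (f_plus - f_minus) / (2 * h)

off_diag = ~np.eye(3, dtype=bool)
```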
5. EXPERIENCE WITH GRADIENT METHODS



In employing gradient methods, the criterion for simple structure, f,
is regarded as a function of the q free elements of the matrix X, where
q is defined by (6) or (14). Given X, the transformation matrix T
is obtained from (10) or (16), the elements, $\partial f/\partial x_{ij}$, of the gradient
are obtained from (13) or (18) and (4), and f is obtained from (3) or
from (5).

Two gradient methods were tried. The first was Fletcher's (1970)
method, a development of the Fletcher-Powell (1963) method which seems to
require fewer function evaluations. The second was the Polak-Ribière
method (Polak, 1971) which is similar to the conjugate gradient method
of Fletcher & Reeves (1964). Fletcher's method, like that of Fletcher &
Powell, builds up an inverse Hessian matrix. The Fletcher-Reeves and
Polak-Ribière methods do not. Consequently, they require less computer
storage than the Fletcher and Fletcher-Powell methods but appear to
converge less rapidly. When the number of factors, m, is less than six or
seven, there are advantages in employing an algorithm which builds up an
inverse Hessian like that of Fletcher; when m is large, storage
considerations could require the use of the Fletcher-Reeves or
Polak-Ribière methods.

The Fletcher method was implemented by making minor changes to a
subroutine package prepared by Gruvaeus & Jöreskog (1970) for the
Fletcher-Powell method. As suggested by Gruvaeus & Jöreskog (1970), a
starting point for the Fletcher method was obtained by carrying out a few
initial steepest descent iterations, starting with X = I. The steepest
descent iterations were terminated when two consecutive iterations yielded
a relative decrease in f of less than five percent.
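The initial steepest descent phase might be sketched as follows for orthogonal rotation, in Python with NumPy. This is not the Gruvaeus & Jöreskog subroutine package: the backtracking line search, the reading of the stopping rule as two successive iterations each reducing f by less than five percent, the function names and the random matrix A are all assumptions made for illustration.

```python
import numpy as np

def cf_value_and_z(A, T, kappa):
    """Criterion (3) and derivative matrix Z of (4) at Lambda = A T."""
    L = A @ T
    d1 = np.sum(L ** 2, axis=1)
    d2 = np.sum(L ** 2, axis=0)
    f = ((1 - kappa) * np.sum(d1 ** 2)
         + kappa * np.sum(d2 ** 2) - np.sum(L ** 4))
    inner = (1 - kappa) * d1[:, None] * L + kappa * L * d2[None, :] - L ** 3
    return f, 4.0 * A.T @ inner

def t_from_theta(theta, m):
    """T of (10) from the free parameters x_21, x_31, x_32, ..."""
    rows, cols = np.tril_indices(m, k=-1)
    X = np.eye(m)
    X[rows, cols] = theta
    X[cols, rows] = -theta
    X_inv = np.linalg.inv(X)
    return 2.0 * X_inv - np.eye(m), X_inv

def f_and_grad(theta, A, kappa):
    m = A.shape[1]
    T, X_inv = t_from_theta(theta, m)
    f, Z = cf_value_and_z(A, T, kappa)
    M = X_inv @ Z.T @ X_inv
    G = 2.0 * (M - M.T)                       # equation (12)
    return f, G[np.tril_indices(m, k=-1)]

def steepest_descent_start(A, kappa, max_iter=50, rel_tol=0.05):
    m = A.shape[1]
    theta = np.zeros(m * (m - 1) // 2)        # corresponds to X = I
    f, g = f_and_grad(theta, A, kappa)
    history = [f]
    small_steps = 0
    for _ in range(max_iter):
        if f == 0.0:
            break
        step = 1.0
        for _ in range(60):                   # backtracking (assumed) line search
            f_new, g_new = f_and_grad(theta - step * g, A, kappa)
            if f_new <= f - 1e-4 * step * (g @ g):
                break
            step *= 0.5
        else:
            break                             # no acceptable step was found
        rel_decrease = (f - f_new) / abs(f)
        theta, f, g = theta - step * g, f_new, g_new
        history.append(f)
        small_steps = small_steps + 1 if rel_decrease < rel_tol else 0
        if small_steps == 2:                  # two consecutive small decreases
            break
    return theta, history

rng = np.random.default_rng(4)
A = rng.normal(size=(10, 3))                  # hypothetical factor matrix
kappa = 1.0 / A.shape[0]
theta_start, history = steepest_descent_start(A, kappa)
T_start, _ = t_from_theta(theta_start, A.shape[1])
```

The resulting `theta_start` would then be passed to the main minimisation routine as its starting point.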

In applications the algorithm has appeared to be satisfactory. Table 1
shows the orthogonal and oblique factor matrices obtained for Harman's 24
psychological tests (Harman, 1960, Table 10.10) with κ = 1/p (Varimax).
Row normalisation of A (e.g., Harman, 1960, p. 502) was not carried out.
Table 2 gives the primary factor correlation matrix C for the oblique
solution. Details of the iterations are shown in Table 3. It can
be seen that the oblique rotation required more iterations than the
orthogonal rotation. This was found to be true in general and can be
expected since more free parameters are involved in oblique rotation.
Iteration was terminated when all elements of the gradient vector were
less than .00001 × 24 in absolute value.

Insert Table 1 about here

Insert Tables 2 and 3 about here

In implementing the Polak-Ribière method (Polak, 1971, pp. 55-5^)
the linear search subroutine of Gruvaeus & Jöreskog (1970) was employed.
Reinitialisation of the process with a steepest descent step was carried
out after each set of q + 1 iterations. Inequalities (7) and (15)
suggest that one should ensure, before each reinitialisation, that the
ordering of columns of T makes the diagonal elements $t_{ii}$ collectively
as large as possible and that $t_{ii} > 0$, i = 1, ..., m.
This was done and, if reordering or reflection of columns of T was
necessary, X was recomputed using (11) or (17). In oblique rotation in
particular this step improved convergence. The Polak-Ribière method then
appeared to be not much slower than the Fletcher method.
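A Polak-Ribière iteration with a steepest descent restart after each set of q + 1 iterations might be sketched as follows, in Python with NumPy, applied here to orthogonal rotation. The backtracking line search stands in for the linear search subroutine actually used, the reordering and reflection of columns before each restart is omitted, and the function names, tolerance and random matrix A are assumptions.

```python
import numpy as np

def f_and_grad(theta, A, kappa):
    """Criterion (3) and gradient (12) in the free parameters of X."""
    m = A.shape[1]
    rows, cols = np.tril_indices(m, k=-1)
    X = np.eye(m)
    X[rows, cols] = theta
    X[cols, rows] = -theta
    X_inv = np.linalg.inv(X)
    L = A @ (2.0 * X_inv - np.eye(m))         # Lambda = A T with T from (10)
    d1 = np.sum(L ** 2, axis=1)
    d2 = np.sum(L ** 2, axis=0)
    f = ((1 - kappa) * np.sum(d1 ** 2)
         + kappa * np.sum(d2 ** 2) - np.sum(L ** 4))
    Z = 4.0 * A.T @ ((1 - kappa) * d1[:, None] * L
                     + kappa * L * d2[None, :] - L ** 3)
    M = X_inv @ Z.T @ X_inv
    return f, (2.0 * (M - M.T))[rows, cols]

def polak_ribiere(fun, x0, tol=1e-6, max_iter=500):
    """Conjugate gradients with a restart after every q + 1 iterations."""
    x = np.asarray(x0, dtype=float)
    q = x.size
    f, g = fun(x)
    d = -g
    for it in range(1, max_iter + 1):
        if np.max(np.abs(g)) < tol:           # all gradient elements small
            break
        slope = g @ d
        if slope >= 0:                        # not a descent direction
            d, slope = -g, -(g @ g)
        step = 1.0
        for _ in range(60):                   # backtracking (assumed) line search
            f_new, g_new = fun(x + step * d)
            if f_new <= f + 1e-4 * step * slope:
                break
            step *= 0.5
        else:
            break                             # no acceptable step was found
        x = x + step * d
        if it % (q + 1) == 0:
            d = -g_new                        # reinitialise with steepest descent
        else:
            beta = g_new @ (g_new - g) / (g @ g)
            d = -g_new + beta * d
        f, g = f_new, g_new
    return x, f, g

rng = np.random.default_rng(5)
A = rng.normal(size=(10, 3))                  # hypothetical factor matrix
kappa = 1.0 / A.shape[0]
fun = lambda theta: f_and_grad(theta, A, kappa)
theta0 = np.zeros(3)                          # X = I
f_begin, _ = fun(theta0)
theta_end, f_end, g_end = polak_ribiere(fun, theta0)
```

Only the current point, gradient and search direction are stored, which is the storage advantage over methods that build up an inverse Hessian.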

Using the same starting point and convergence criterion as those of 
the Fletcher method, the Polak-Ribiere method was applied to Harman's 
factor matrix. The rotated factor matrices yielded by the two methods 
agreed to three decimal places. It can be seen from Table 3 that the 
Polak-Ribiere method compared quite favourably with the Fletcher method 
in speed of convergence. 
Table 1. Harman's 24 Psychological Tests: Factor Pattern Matrices, κ = 1/p

                Orthogonal Rotation                       Oblique Rotation
Test         1         2         3         4         1         2         3         4
   1      .235      .653      .197      .125      .002      .657      .102      .088
   2      .165  [illeg.]      .072      .074      .024      .429      .007      .054
   3      .223      .520      .020      .054      .068      .545     -.058      .028
   4      .267      .518      .086      .037      .110      .535      .014      .001
   5      .780      .124      .200      .052      .750      .045      .141     -.003
   6      .785      .135      .098      .141      .746      .056      .001      .107
   7      .843      .105      .153     -.002      .841      .036      .092     -.062
   8      .594      .309      .254      .048      .493      .258      .188     -.009
   9      .837      .120      .013      .184      .807      .039     -.103      .161
  10      .170     -.075      .714      .166      .075     -.212      .723      .124
  11      .218      .065      .627      .290      .069     -.071      .583      .264
  12      .062      .228      .692      .040     -.099      .154      .708     -.024
  13      .240      .390      .590     -.014  [illeg.]      .341      .582     -.092
  14      .261      .017      .196      .463      .146     -.103      .080      .496
  15      .174      .128      .107      .478      .029      .035     -.023      .520
  16      .171      .405      .127      .408     -.044      .349     -.012      .428
  17      .197      .053      .232      .607      .032     -.086      .086      .658
  18      .082      .323      .306      .506     -.157      .227      .172      .532
  19      .189      .224      .180      .361      .052      .150      .072      .377
  20      .423      .432      .116      .205      .260      .396      .003      .185
  21      .229      .402      .394      .202      .029      .343      .319      .170
  22      .435      .371      .064      .315      .274      .316     -.074      .316
  23      .432      .526      .241      .159      .239      .495      .109      .121
  24      .398      .182      .459      .267      .250      .070      .383      .238
Table 2. Harman's 24 Psychological Tests: Primary Factor Correlations, κ = 1/p

1.000
 .581   1.000
 .271    .295   1.000
 .550    .550    .575   1.000